71 research outputs found

    Robust regularized singular value decomposition with application to mortality data

    We develop a robust regularized singular value decomposition (RobRSVD) method for analyzing two-way functional data. The research is motivated by the application of modeling human mortality as a smooth two-way function of age group and year. The RobRSVD is formulated as a penalized loss minimization problem in which a robust loss function measures the reconstruction error of a low-rank matrix approximation of the data, and an appropriately defined two-way roughness penalty function ensures smoothness along each of the two functional domains. By viewing the minimization problem as two conditional regularized robust regressions, we develop a fast iterative reweighted least squares algorithm to implement the method. Our implementation naturally incorporates missing values. Furthermore, our formulation allows rigorous derivation of leave-one-row/column-out cross-validation and generalized cross-validation criteria, which enable computationally efficient data-driven penalty parameter selection. The advantages of the new robust method over nonrobust ones are shown via extensive simulation studies and the mortality rate application. Comment: Published at http://dx.doi.org/10.1214/13-AOAS649 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
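The alternating structure described in this abstract (two conditional robust regressions solved by iteratively reweighted least squares) can be illustrated with a rough rank-one sketch. This is not the authors' RobRSVD implementation: the Huber loss with tuning constant 1.345, the MAD residual scale, the separable second-difference penalty, and the names `huber_weights`, `second_diff_penalty`, and `robust_rank1` are all assumptions chosen for illustration. The two-way penalty in the paper is defined differently.

```python
import numpy as np

def huber_weights(r, k=1.345):
    """IRLS weights for the Huber loss: 1 inside [-k, k], k/|r| outside."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def second_diff_penalty(n):
    """Roughness penalty matrix D'D, with D the second-difference operator."""
    D = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n)
    return D.T @ D

def robust_rank1(X, lam_u=1.0, lam_v=1.0, n_iter=50, k=1.345):
    """Hypothetical rank-one robust, smoothed fit X ~ outer(u, v).

    NaN entries are treated as missing and receive zero weight.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    obs = ~np.isnan(X)
    Xf = np.where(obs, X, 0.0)
    # Start from the ordinary (nonrobust) SVD of the zero-filled matrix.
    U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
    u, v = U[:, 0] * s[0], Vt[0]
    Pu, Pv = second_diff_penalty(m), second_diff_penalty(n)
    for _ in range(n_iter):
        # Residuals on observed cells, scaled by a MAD-type estimate (assumption).
        R = np.where(obs, Xf - np.outer(u, v), 0.0)
        scale = 1.4826 * np.median(np.abs(R[obs])) + 1e-12
        W = huber_weights(R / scale, k) * obs
        # Update u given v: penalized weighted least squares.
        A = np.diag(W @ v**2) + lam_u * Pu
        u = np.linalg.solve(A, (W * Xf) @ v)
        # Update v given the new u, reusing the same weights.
        B = np.diag(W.T @ u**2) + lam_v * Pv
        v = np.linalg.solve(B, (W * Xf).T @ u)
        # Keep v at unit length so that u carries the scale.
        nv = np.linalg.norm(v)
        v, u = v / nv, u * nv
    return u, v
```

In this sketch the weights are recomputed once per outer iteration; the penalty parameters `lam_u` and `lam_v` are fixed by hand rather than chosen by the cross-validation criteria mentioned in the abstract.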

    Functional singular value decomposition and multi-resolution anomaly detection

    This dissertation has two major parts. The first part discusses the connections and differences between the statistical tool of Principal Component Analysis (PCA) and the related numerical method of Singular Value Decomposition (SVD), and related visualization methods. The second part proposes a Multi-Resolution Anomaly Detection (MRAD) method for time series with long range dependence (LRD). PCA is a popular method in multivariate analysis and in Functional Data Analysis (FDA). Compared to PCA, SVD is more general, because it not only provides a direct approach to calculate the principal components (PCs), but also simultaneously yields the PCAs for both the row and the column spaces. SVD has been used directly to explore and analyze data sets, and has been shown to be an insightful analysis tool in many fields. However, the connections and differences between PCA and SVD have seldom been explored from a statistical viewpoint. Here we explore the connections and differences between PCA and SVD, and extend the usual SVD method to variations including different centerings based on various types of means. A generalized scree plot is developed to provide a visual aid for selection of different centerings. Several matrix views of the SVD components are introduced to explore different features in data, including SVD surface plots, image plots, rotation movies, and curve movies. These methods visualize both column and row information of a two-way matrix simultaneously, relate the matrix to relevant curves, and show local variations and interactions between columns and rows. Several toy examples are designed to compare the different types of centerings, and three real applications are used to illustrate the matrix views. In the field of Internet traffic anomaly detection, different types of network anomalies exist at different time scales. This motivates anomaly detection methods that effectively exploit multiscale properties.
Because time series of Internet measurements exhibit long range dependence (LRD) and self-similarity (SS), classical outlier detection methods based on short-range dependent time series may not be suitable for identifying network anomalies. Based on a time series collected at a single scale (the finest scale), we aggregate to form time series of various scales, and propose an MRAD procedure to find anomalies which appear at different time scales. We show that this MRAD method is more conservative than a typical outlier detection method based on a given scale, and has larger power on average than any single-scale outlier detection method under some reasonable assumptions. The asymptotic distribution of the test statistic is developed as well. An MRAD map is developed to show candidate anomalies and the corresponding significance probabilities (p values). This method can be easily extended to be implemented in real time. Simulations and real examples are reported as well, to illustrate the usefulness of the MRAD method. Keywords: Principal Component Analysis, Functional Data Analysis, Exploratory Data Analysis, Network Intrusion Detection, Outlier detection, Level Shift, Multiscale analysis, Long Range Dependence, Multiple Comparison, p values, Time Series, false discovery rate
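The PCA/SVD connection discussed in this abstract can be made concrete with a short sketch: PCA of a data matrix coincides with the SVD of its column-centered version, and other centerings give other SVD variants. The abstract does not list its centerings, so the four options below (none, column, row, double) are assumed as common choices, and the function names `center` and `pca_via_svd` are hypothetical.

```python
import numpy as np

def center(X, kind="column"):
    """Return X after the requested centering (the set of options is an assumption)."""
    X = np.asarray(X, dtype=float)
    if kind == "none":
        return X.copy()
    if kind == "column":   # subtract each column's mean
        return X - X.mean(axis=0, keepdims=True)
    if kind == "row":      # subtract each row's mean
        return X - X.mean(axis=1, keepdims=True)
    if kind == "double":   # remove row and column means, add back the grand mean
        return (X - X.mean(axis=0, keepdims=True)
                  - X.mean(axis=1, keepdims=True) + X.mean())
    raise ValueError(f"unknown centering: {kind}")

def pca_via_svd(X):
    """PCA of the rows of X computed from the SVD of the column-centered matrix."""
    Xc = center(X, "column")
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (Xc.shape[0] - 1)  # eigenvalues of the sample covariance
    scores = U * s                        # principal component scores
    return Vt, variances, scores          # rows of Vt are the PC directions
```

With column centering, `variances` matches the eigenvalues of the sample covariance matrix, which is the sense in which SVD gives a direct route to the principal components. Applying `np.linalg.svd` to `center(X, "row")` instead gives the analogous decomposition for the row space.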